Mining Extremely Skewed Trading Anomalies

Authors

  • Wei Fan
  • Philip S. Yu
  • Haixun Wang
Abstract

Trading surveillance systems screen for and detect anomalous trades of equities, bonds, and mortgage certificates, among others. This is done to satisfy federal trading regulations as well as to prevent crimes such as insider trading and money laundering. Most existing trading surveillance systems are based on hand-coded expert rules. Such systems are known to require long development processes and to produce extremely high “false positive” rates. We participate in co-developing a data-mining-based automatic trading surveillance system for one of the biggest banks in the US. The challenge of this task is to handle a very skewed positive class (< 0.01%) as well as a very large volume of data (millions of records and hundreds of features). The combination of a very skewed distribution and a huge data volume poses a new challenge for data mining; previous work addresses these issues separately, and existing solutions are rather complicated and not straightforward to implement. In this paper, we propose a simple systematic approach to mine “very skewed distribution in very large volume of data”.


Similar resources

The effect of estimation methods on fractal modeling for anomalies’ detection in the Irankuh area, Central Iran

This study aims to recognize the effect of the Ordinary Kriging (OK) and Inverse Distance Weighted (IDW) estimation methods on the separation of geochemical anomalies, based on soil samples, using the Concentration-Area (C-A) fractal model in the Irankuh area, central Iran. Variograms and anisotropic ellipsoids were generated for the Pb and Zn distributions. Threshold values from the C-A log-log plots based on the e...


Investigation of linear and non-linear estimation methods in highly-skewed gold distribution

The purpose of this work is to compare linear and non-linear kriging methods in the mineral resource estimation of the Qolqoleh gold deposit in Saqqez, NW Iran. Because the gold distribution is positively skewed and differs significantly from a normal curve, geostatistical estimation is complicated in such cases. Linear kriging, as a resource estimation method, c...


Develando estrategias de mercado: minería de datos aplicada al análisis de mercados financieros

It has become increasingly common to model financial markets using frameworks which better capture their behavior than the excessively simplistic traditional frameworks. Key concepts in these new frameworks are evolution, complex systems and data mining, each with their associated characteristic analysis. In particular, data mining provides extremely useful tools for potentially extracting know...


Mining Data Streams with Skewed Distribution based on Ensemble Method

In recent years, there have been some interesting studies on predictive modeling in data streams. However, most such studies assume relatively balanced and stable data streams and cannot handle skewed distributions (e.g., few positives but many negatives) well, although these are typical in many data stream applications. In this paper, we propose an ensemble and cluster based sample method...


Stock Data Mining through Fuzzy Genetic Algorithm

Stock data mining, such as financial pairs mining, is useful for trading support and market surveillance. Financial pairs mining targets pair relationships between financial entities such as stocks and markets. This paper introduces a fuzzy genetic algorithm framework and strategies for discovering pair relationships in stock data, such as high-dimensional trading data, by considering use...



Publication date: 2004